Abstract

This paper explores a PAC (probably approximately correct) learning 
model in cooperative games. Specifically, we are given m random samples
of coalitions and their values, taken from some unknown cooperative
game; can we predict the values of unseen coalitions? We study the PAC 
learnability of several well-known classes of cooperative games, such as 
network flow games, threshold task games, and induced subgraph games. 
We also establish a novel connection between PAC learnability and core 
stability: for games that are efficiently learnable, it is possible to find 
payoff divisions that are likely to be stable using a polynomial number of 
samples. 


1 Introduction 


Cooperative game theory studies the following model. We are given a set of
players $N = \{1, \dots, n\}$, and $v : 2^N \to \mathbb{R}$ is a function assigning a value to every
subset (also referred to as a coalition) $S \subseteq N$.

The game-theoretic literature generally focuses on revenue division: suppose
that players have formed the coalition $N$; they must now divide the revenue
$v(N)$ among themselves in some reasonable manner. However, all of the standard
solution concepts for cooperative games require intimate knowledge of the
structure of the underlying coalitional interactions. For example, suppose that
a department head wishes to divide company bonuses among her employees in
a canonically stable manner using the core: a division such that each coalition is
paid (in total) at least its value. In order to do so, she must know the value
that would have been generated by every single subset of her staff. How would
she obtain all this information?

Indeed, it is the authors’ opinion that the information required in order to
compute cooperative solution concepts (much more than computational complexity)
is a major obstacle to their widespread implementation.

Let us therefore relax our requirements. Instead of querying every single
coalition value, we would like to elicit the underlying structure of coalitional
interactions using a sample of $m$ evaluations of $v$ on subsets of $N$. To be more
specific, let us focus on the most common learning-theoretic model: the probably
approximately correct (PAC) model [Kearns and Vazirani, 1994]. Briefly,
the PAC model studies the following problem: we are given a set of points
$x_1, \dots, x_m \in \mathbb{R}^n$ and their values $y_1, \dots, y_m$. There is some function $f$ that
generated these values, but it is not known to us. We are interested in finding a
function $f^*$ that, given that $x_1, \dots, x_m$ were independently sampled from some
distribution $\mathcal{D}$, is very likely (“probably”) to agree with $f$ on most
(“approximately”) points sampled from the same distribution.


Procaccia and Rosenschein [2006] provide some preliminary results on PAC
learning cooperative games, focusing on simple games (this is a technical term,
not an opinion!), where $v(S) \in \{0,1\}$ for every $S \subseteq N$. Their results are
mostly negative, showing that simple games require an exponential number of
samples in order to be properly PAC learned (with the exception of the trivial
class of unanimity games). However, the decade following the publication of
their work has seen an explosive growth in the number of well-understood classes
of cooperative games, as well as a better understanding of the computational
difficulties one faces when computing cooperative solution concepts. This is
where our work comes in.


1.1 Our Contribution 

We revisit the connection between learning theory and cooperative games, greatly
expanding on the results of Procaccia and Rosenschein [2006].


In Section 3 we introduce a novel relaxation of the core: it is likely (but,
in contrast to the classic core, not certain) that a coalition cannot improve its
payoff by working alone. Focusing on probable stability against likely deviations
saves us a lot of computational overhead: our first result (Theorem 3.1) shows
that any cooperative game is PAC stabilizable; that is, there exists an algorithm
that will output a payoff division that is likely to be resistant against future
deviations by sets sampled from a distribution $\mathcal{D}$, given a polynomial number
of sets sampled i.i.d. from the same distribution. What’s more, the payoff
output by the algorithm is feasible: it is no more than $v(N)$ if the core of the game is not
empty; if the core of $v$ is empty, then the total payoff will be no more than the
minimum required in order to stabilize the game. In other words, this algorithm
will pay no more than the cost of stability [Bachrach et al., 2009] of the game $v$.

While coalitional stability is naturally desirable, understanding the underlying
coalitional dynamics is no less important. In Section 4 we ask whether or
not classes of games are efficiently learnable; that is, is there a polynomial-time
algorithm that receives a polynomial number of samples, and outputs an accurate
hypothesis with high confidence? Our main results are that network flow
games [Maschler et al., 2013, Chapter 17.9] are efficiently learnable with path
queries (but not in general), and so are threshold task games [Chalkiadakis et al.,
2010] and induced subgraph games [Deng and Papadimitriou, 1994]. We also
study k-vector weighted voting games [Elkind et al., 2009], MC-nets [Ieong and Shoham,
2005], and coalitional skill games [Bachrach and Rosenschein, 2008].


1.2 Related Work 


Aside from the closely related work of Procaccia and Rosenschein [2006], there
are several papers that study coalitional stability in uncertain environments.
Chalkiadakis and Boutilier [2004] and Li and Conitzer [2015] assume that coalition
values are drawn from some unknown distribution, and we observe noisy
estimates of the values. However, both papers assume full access to the cooperative
game, whereas we assume that $m$ independent samples are observed. Other
works study coalitional uncertainty: coalition values are known, but agent
participation is uncertain due to failures [Bachrach et al., 2012a,b; Bachrach and Shah,
2013].

Our work is also related to papers on eliciting and learning combinatorial
valuation functions [Zinkevich et al., 2003; Lahaie and Parkes, 2004; Lahaie et al.,
2005; Balcan and Harvey, 2011; Balcan et al., 2012; Badanidiyuru et al., 2012].
A player’s valuation function in a combinatorial auction is similar to a cooperative
game: it assigns a value to every subset of items (instead of every subset
of players). This connection allows us to draw on some of the insights from
these papers. For example, as we explain below, learnability results for XOS
valuations [Balcan et al., 2012] informed our results on network flow games.


2 Preliminaries 

2.1 Cooperative Games 

A cooperative game is a tuple $\mathcal{G} = \langle N, v \rangle$, where $N = \{1, \dots, n\}$ is a set of
players, and $v : 2^N \to \mathbb{R}$ is called the characteristic function of $\mathcal{G}$. When the
player set $N$ is obvious, we will identify $\mathcal{G}$ with the characteristic function $v$,
referring to $v$ as the game. A game $\mathcal{G}$ is called simple if $v(S) \in \{0,1\}$ for all
$S \subseteq N$; $\mathcal{G}$ is called monotone if $v(S) \le v(T)$ whenever $S \subseteq T$. One of the main
objectives in cooperative games is finding “good” ways of dividing revenue: it
is assumed that players have generated the revenue $v(N)$, and must find a way
of splitting it. An imputation for $\mathcal{G}$ is a vector $\mathbf{x} \in \mathbb{R}^n$ that satisfies efficiency:
$\sum_{i=1}^n x_i = v(N)$, and individual rationality: $x_i \ge v(\{i\})$ for every $i \in N$. The
set of imputations, denoted $I(\mathcal{G})$, is the set of all possible “reasonable” payoff
divisions among the players. Writing $x(S) = \sum_{i \in S} x_i$ for the total payoff to a
coalition $S$, the core of $\mathcal{G}$ is given by

$$Core(\mathcal{G}) = \{\mathbf{x} \in I(\mathcal{G}) \mid \forall S \subseteq N : x(S) \ge v(S)\}.$$

The core is the set of all stable imputations: no subset of players $S$ can deviate
from an imputation $\mathbf{x} \in Core(\mathcal{G})$ while guaranteeing that every $i \in S$ receives
at least as much as it gets under $\mathbf{x}$.
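
To make the definition concrete, the following is a minimal sketch of a brute-force core membership test. It enumerates all $2^n - 1$ non-empty coalitions, which is exactly the information requirement discussed in the introduction, so it is only usable for very small games. The function names and the three-player example game are illustrative assumptions, not part of the paper.

```python
from itertools import combinations

def coalitions(players):
    """Yield every non-empty subset of `players` as a frozenset."""
    for size in range(1, len(players) + 1):
        for subset in combinations(players, size):
            yield frozenset(subset)

def in_core(payoff, v, players, tol=1e-9):
    """Return True if `payoff` is efficient and no coalition S has x(S) < v(S)."""
    total = sum(payoff[i] for i in players)
    if abs(total - v(frozenset(players))) > tol:
        return False  # efficiency violated
    return all(
        sum(payoff[i] for i in S) >= v(S) - tol
        for S in coalitions(players)
    )

# Hypothetical 3-player game: a coalition earns 1 if it has at least two members.
players = [1, 2, 3]
v = lambda S: 1.0 if len(S) >= 2 else 0.0

equal_split = {1: 1 / 3, 2: 1 / 3, 3: 1 / 3}
print(in_core(equal_split, v, players))  # False: every pair receives 2/3 < 1
```

Individual rationality is covered by the singleton coalitions, so it needs no separate check.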


2.2 PAC Learning 


We provide a brief overview of the PAC learning model; for a far more detailed
exposition, see [Kearns and Vazirani, 1994; Shashua, 2009]. PAC learning pertains
to the study of the following problem: we are interested in learning an
unknown function $f : 2^N \to \mathbb{R}$. In order to estimate the value of $f$, we are given
$m$ samples $(S_1, v_1), \dots, (S_m, v_m)$, where $v_j = f(S_j)$. Without any additional
information, one could make arbitrary guesses as to the possible identity of $f$;
for example, we could very well guess that $f^*(S_j) = v_j$ for all $j \in [m]$, and 0
everywhere else. Thus, in order to obtain meaningful results, we must make
further assumptions. First, we restrict $f$ to be a function from a certain class of
functions $\mathcal{C}$: for example, we may know that $f$ is a linear function of the form
$f(S) = \sum_{i \in S} w_i$, but we do not know the values $w_1, \dots, w_n$. Second, we assume
that there is some distribution $\mathcal{D}$ over $2^N$ such that $S_1, \dots, S_m$ were sampled
i.i.d. from $\mathcal{D}$. Finally, we require that the estimate that we provide has low
error over sets sampled from $\mathcal{D}$.

Formally, we are given a function $v : 2^N \to \mathbb{R}_+$, and two values $\varepsilon > 0$
(the accuracy parameter) and $\delta > 0$ (the confidence parameter). An algorithm
$A$ takes as input $\varepsilon$, $\delta$ and $m$ samples, $(S_1, v(S_1)), \dots, (S_m, v(S_m))$, taken i.i.d.
from a distribution $\mathcal{D}$. We say that $A$ can properly learn a function $f \in \mathcal{C}$ from
a class of functions $\mathcal{C}$ ($\mathcal{C}$ is sometimes referred to as the hypothesis class), if
by observing $m$ samples, where $m$ can depend only on $n$ (the representation
size), $\frac{1}{\varepsilon}$, and $\frac{1}{\delta}$, it outputs a function $f^* \in \mathcal{C}$ such that with probability at
least $1 - \delta$,

$$\Pr_{S \sim \mathcal{D}}[f(S) \ne f^*(S)] < \varepsilon.$$

The confidence parameter $\delta$ indicates that there is some chance that $A$ will
output a bad guess (intuitively, that the $m$ samples given to the algorithm are
not representative of the overall behavior of $f$ over the distribution $\mathcal{D}$), but this
is unlikely. The accuracy parameter $\varepsilon$ indicates that for most sets $S$ sampled from
$\mathcal{D}$, $f^*$ will correctly guess the value of $S$.

Note that the algorithm $A$ does not know $\mathcal{D}$; that is, the only thing required
for PAC learnability to hold is that the input samples are independent, and that
future observations are also sampled from $\mathcal{D}$. In this paper, we only discuss
proper learning; that is, learning a function $f \in \mathcal{C}$ using only functions from $\mathcal{C}$.

We say that a finite class of functions $\mathcal{C}$ is efficiently PAC learnable if the
PAC learning algorithm described above runs in polynomial time, and its sample
complexity $m$ is polynomial in $n$, $\frac{1}{\varepsilon}$, and $\log \frac{1}{\delta}$.

Efficient PAC learnability can be established via the existence of consistent
algorithms. Given a class of functions $\mathcal{C}$ from $2^N$ to $\mathbb{R}$, suppose that there is
some efficient algorithm $A$ that for any set of samples $(S_j, v_j)_{j=1}^m$ is able to
output a function $f^* \in \mathcal{C}$ such that $f^*(S_j) = v_j$ for all $j \in [m]$, or determine
that no such function exists. Then $A$ is an algorithm that can efficiently PAC
learn $\mathcal{C}$ given $m \ge \frac{1}{\varepsilon} \log \frac{|\mathcal{C}|}{\delta}$ samples. Conversely, if no efficient consistent
algorithm exists, then $f$ cannot be efficiently PAC learned from $\mathcal{C}$.
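
As a rough illustration of how such a sample bound scales, the sketch below evaluates the standard finite-class bound $m \ge \frac{1}{\varepsilon}(\ln|\mathcal{C}| + \ln\frac{1}{\delta})$. The use of natural logarithms and the absence of any extra constant are assumptions of this sketch (textbook statements differ in constants), and the function name is hypothetical.

```python
import math

def finite_class_sample_bound(epsilon, delta, log_class_size):
    """Samples sufficient for a consistent learner over a finite class:
    m >= (1/epsilon) * (ln|C| + ln(1/delta)).
    `log_class_size` is ln|C|, passed directly so huge classes do not overflow."""
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie in (0, 1)")
    return math.ceil((log_class_size + math.log(1.0 / delta)) / epsilon)

# Example: a class of size (n + 1)^n with n = 10, accuracy 0.1, confidence 0.05.
n = 10
m = finite_class_sample_bound(0.1, 0.05, n * math.log(n + 1))
print(m)
```

The bound grows linearly in $\frac{1}{\varepsilon}$ and in $\ln|\mathcal{C}|$, but only logarithmically in $\frac{1}{\delta}$.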

To conclude, in order for a class $\mathcal{C}$ to be efficiently PAC learnable, we must
have polynomial bounds on the sample complexity (i.e., the number of samples
required in order to obtain a good estimate of functions in $\mathcal{C}$) as well as a
poly-time algorithm that finds a function in $\mathcal{C}$ which is a perfect match for the
samples. We observe that in many of the settings described in this paper, the
sample complexity is low, but finding consistent functions in $\mathcal{C}$ is computationally
intractable (it would entail that P = NP or that NP = RP). In contrast,
the result of Procaccia and Rosenschein [2006] establishes lower bounds on the
sample complexity for PAC learning monotone simple games, but there exists a
simple algorithm that outputs a hypothesis consistent with any sample.

When the hypothesis class $\mathcal{C}$ is finite, it suffices to show that $\log|\mathcal{C}|$ is
bounded by a polynomial in order to establish a polynomial sample complexity.
In the case of an infinite class of hypotheses, this bound becomes meaningless,
and other measures must be used. When learning a function that takes values in
$\{0,1\}$, the VC dimension [Kearns and Vazirani, 1994] captures the learnability
of $\mathcal{C}$. Given a class $\mathcal{C}$, and a list $\mathcal{S}$ of $m$ sets $S_1, \dots, S_m$, we say that $\mathcal{C}$ shatters
$\mathcal{S}$ if for every $\mathbf{b} \in \{0,1\}^m$ there exists some $v_{\mathbf{b}} \in \mathcal{C}$ such that $v_{\mathbf{b}}(S_j) = b_j$ for
all $j$. We write

$$VCdim(\mathcal{C}) = \max\{m \mid \exists \mathcal{S}, |\mathcal{S}| = m, \mathcal{C} \text{ can shatter } \mathcal{S}\}.$$
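
For a small finite class, shattering can be checked directly by enumeration. The sketch below is illustrative only; the class of unanimity-style games over two players used in the example is an assumption chosen to keep the enumeration tiny.

```python
from itertools import product

def shatters(hypotheses, sets):
    """True if every 0/1 labelling of `sets` is realised by some hypothesis."""
    realised = {tuple(h(S) for S in sets) for h in hypotheses}
    return all(b in realised for b in product((0, 1), repeat=len(sets)))

# Hypothetical class: u_T(S) = 1 iff T is a subset of S, for every T within {1, 2}.
targets = [frozenset(), frozenset({1}), frozenset({2}), frozenset({1, 2})]
hypotheses = [lambda S, T=T: int(T <= S) for T in targets]

pair = [frozenset({1}), frozenset({2})]
triple = pair + [frozenset({1, 2})]
print(shatters(hypotheses, pair))    # True: all four labellings are realised
print(shatters(hypotheses, triple))  # False: 4 hypotheses cannot give 8 labellings
```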

When learning hypotheses that output real numbers (as opposed to functions
that take on values in $\{0,1\}$), the notion of pseudo dimension is used in order
to bound the complexity of a function class. Given a sample of $m$ sets $\mathcal{S} =
S_1, \dots, S_m \subseteq N$, we say that a class $\mathcal{C}$ shatters $\mathcal{S}$ if there exist thresholds
$r_1, \dots, r_m \in \mathbb{R}$ such that for every $\mathbf{b} \in \{0,1\}^m$ there exists some $v_{\mathbf{b}} \in \mathcal{C}$ such
that $v_{\mathbf{b}}(S_j) \ge r_j$ if $b_j = 1$, and $v_{\mathbf{b}}(S_j) < r_j$ if $b_j = 0$. We write

$$Pdim(\mathcal{C}) = \max\{m \mid \exists \mathcal{S} : |\mathcal{S}| = m, \mathcal{C} \text{ can shatter } \mathcal{S}\}.$$

It is known [Anthony and Bartlett, 2009] that if $Pdim(\mathcal{C})$ is polynomial, then
the sample complexity of $\mathcal{C}$ is polynomial as well.


3 PAC Stability 

In the context of cooperative games, one could think of PAC learning as the
following process. A central authority wishes to find a stable outcome, but
lacks information about agents’ abilities. It solicits the independent valuations
of $m$ subsets of agents, and outputs an outcome that, with probability $1 - \delta$, is
likely to be stable against any unknown valuations.

More formally, given $\varepsilon \in (0,1)$, we say that an imputation $\mathbf{x} \in I(\mathcal{G})$ is
$\varepsilon$-probably stable under $\mathcal{D}$ if

$$\Pr_{S \sim \mathcal{D}}[x(S) \ge v(S)] \ge 1 - \varepsilon.$$
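
The probability in this definition is taken over the unknown distribution, so in practice it can only be estimated. The sketch below computes the empirical fraction of sampled coalitions that a payoff vector keeps satisfied; this is an estimate on a finite held-out sample, not the true probability, and the data in the example are hypothetical.

```python
def stable_fraction(payoff, samples):
    """Fraction of sampled coalitions S with x(S) >= v(S)."""
    if not samples:
        raise ValueError("need at least one sample")
    satisfied = sum(
        1 for S, value in samples
        if sum(payoff[i] for i in S) >= value
    )
    return satisfied / len(samples)

# Hypothetical held-out sample of (coalition, value) pairs.
payoff = {1: 1.0, 2: 2.0, 3: 0.5}
held_out = [
    (frozenset({1, 2}), 3.0),  # x(S) = 3.0, stable
    (frozenset({2, 3}), 2.0),  # x(S) = 2.5, stable
    (frozenset({1, 3}), 2.0),  # x(S) = 1.5, blocked
    (frozenset({3}), 0.5),     # x(S) = 0.5, stable
]
print(stable_fraction(payoff, held_out))  # 0.75
```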

An algorithm $A$ can PAC stabilize a class of functions $\mathcal{C}$ from $2^N$ to $\mathbb{R}$ if,
given $\varepsilon, \delta \in (0,1)$, and $m$ i.i.d. samples $(S_1, v(S_1)), \dots, (S_m, v(S_m))$ of some
$v \in \mathcal{C}$, with probability $1 - \delta$, $A$ outputs an outcome $\mathbf{x}$ that is $\varepsilon$-probably
stable under $\mathcal{D}$. There is an important subtlety here: suppose that we know
the value $v(N)$; by grossly overpaying the agents we could easily stabilize any
game. Under the mild assumption of monotonicity, we can pay each $i \in N$ a
value of $v(N)$, which results in a trivially stable payoff division. In addition to
PAC stability, we must ensure that the total payment to agents is no more than
$v(N)$ if the core is not empty; if the core of $\mathcal{G}$ is empty, then the output of
our algorithm should be no more than the minimal payment required in order
to stabilize the game. Formally, given a cooperative game $\mathcal{G} = \langle N, v \rangle$ and a
non-negative constant $\Delta \in \mathbb{R}_+$, we define $\mathcal{G}_\Delta = \langle N, v_\Delta \rangle$, where $v_\Delta(S) = v(S)$
for all $S \subsetneq N$, and $v_\Delta(N) = v(N) + \Delta$. The cost of stability [Bachrach et al.,
2009] is

$$CoS(\mathcal{G}) = \min\{\Delta \in \mathbb{R}_+ \mid Core(\mathcal{G}_\Delta) \ne \emptyset\}.$$

If $Core(\mathcal{G}) \ne \emptyset$ then $CoS(\mathcal{G}) = 0$; indeed, the larger the value of $CoS(\mathcal{G})$, the
greater the subsidy required in order to stabilize $\mathcal{G}$. The following theorem
establishes the poly-time PAC stabilizability of cooperative games,^1 with formal
bounds on the total payoff provided by the output imputation.


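Theorem 3.1. There exists an algorithm that, given a cooperative game $\mathcal{G} =
\langle N, v \rangle$ and $m$ i.i.d. samples $(S_1, v(S_1)), \dots, (S_m, v(S_m))$ sampled from a
distribution $\mathcal{D}$, where $m$ is polynomial in $\frac{1}{\varepsilon}$, $\log\frac{1}{\delta}$, outputs a payoff division $\mathbf{x}^*$
that $(\varepsilon, \delta)$ PAC stabilizes $\mathcal{G}$. Furthermore, $x^*(N) \le v(N) + CoS(\mathcal{G})$; in particular, if
$Core(\mathcal{G}) \ne \emptyset$ then $x^*(N) \le v(N)$.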


Proof. Let us first consider the problem of computing the cost of stability. It
can be expressed as the following linear optimization problem:

$$\begin{array}{lll}
\min & \sum_{i=1}^n x_i & (1) \\
\text{s.t.} & \sum_{i \in S} x_i \ge v(S) & \forall S \subseteq N
\end{array}$$

If the value of the optimal solution to (1) is at most $v(N)$ then the core of $\mathcal{G}$ is not
empty. Now, clearly (1) is not poly-time computable, since the number of constraints
is exponential. However, given $m$ i.i.d. samples $(S_1, v(S_1)), \dots, (S_m, v(S_m))$,
consider the following optimization problem:

$$\begin{array}{lll}
\min & \sum_{i=1}^n x_i & (2) \\
\text{s.t.} & \sum_{i \in S_j} x_i \ge v(S_j) & \forall j \in \{1, \dots, m\}
\end{array}$$

Unlike (1), (2) is poly-time computable in $m$ and $n$. Moreover, if $\mathbf{x}^*$ is the optimal
solution to (2) then the value of $x^*(N)$ is no greater than the value of the
optimal solution to (1), as (1) imposes more constraints on the target function
than (2). To see why (2) outputs a payoff division that PAC stabilizes $\mathcal{G}$, we
observe that the problem of finding a PAC stable payoff division is equivalent to
the problem of learning an unknown linear function $\mathbf{x}$ such that $x(S) \ge v(S)$
for all $S \subseteq N$: since (2) is a consistent algorithm, we know that it outputs a
PAC approximation of $\mathbf{x}$. □
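
As a small worked instance of the sampled program (2), suppose (hypothetically) that $n = 3$ and that the three observed samples are $(\{1,2\}, 4)$, $(\{2,3\}, 6)$ and $(\{1\}, 1)$. These numbers are assumptions chosen for illustration and do not come from the paper.

```latex
\begin{align*}
\min\ & x_1 + x_2 + x_3 \\
\text{s.t.}\ & x_1 + x_2 \ge 4, \qquad x_2 + x_3 \ge 6, \qquad x_1 \ge 1 .
\end{align*}
% Adding the second and third constraints gives x_1 + x_2 + x_3 \ge 7.
% The point x^* = (1, 3, 3) satisfies all three constraints with x^*(N) = 7,
% so x^* is an optimal solution of the sampled program.
```

Only the three sampled coalitions constrain the solution; a coalition that was never observed, such as $\{1,3\}$, may well be underpaid, which is precisely why the guarantee is probabilistic.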

^1 The authors would like to thank Amit Daniely for pointing out some of the basic ideas
outlined in the proof of Theorem 3.1.
Theorem 3.1 states that any cooperative game is PAC stabilizable, irrespective
of its own learnability guarantees. While the proof of Theorem 3.1
is simple, it has several important implications. The immediate implication of
Theorem 3.1 is overcoming the computational complexity of finding stable outcomes.
If one is willing to forgo guaranteed stability, it is possible to find payoff
divisions that are likely to be stable. Second, PAC stabilizability is a stability
concept that is founded on observational data: one does not need to know anything
about the underlying cooperative game in order to use partial observations
of its values to achieve stability. Finally, the underlying assumptions about the
PAC stabilizability concept are reasonable from a practical perspective. There
are several works that study the stability of cooperative games when certain
restrictions on the coalitions that may form are in place. The most notable
example of such a line of work is Myerson interaction graphs [Myerson, 1977]:
in addition to the game $\mathcal{G} = \langle N, v \rangle$, we are given a connected, undirected graph
$\Gamma = \langle N, E \rangle$, whose nodes are the players. A coalition $S \subseteq N$ may form only if
the subgraph induced by $S$ in $\Gamma$ is connected. Several works have shown that
under certain assumptions on the structure of the Myerson interaction graph,
graph restricted coalitional games may be stabilized [Demange, 2004], exhibit
a low cost of stability [Bousquet et al., 2015; Meir et al., 2013], and have stable
outcomes found in polynomial time [Chalkiadakis et al., 2012; Zick et al.,
2012]. Rather than assuming a certain underlying structure on the game, the
PAC stabilizability approach observes the actual coalitions formed, and assumes
that past events are a good prediction of future agent coalition formation habits;
thus, if all observed coalitions were, say, of size at most 2, it is likely that the
PAC stable payoff division would be stable against pairwise deviations.


4 PAC Learnability of Common Classes of Cooperative Games


In what follows, we explore the PAC learnability of common classes of cooperative
games. Some of our computational intractability results depend on the
assumption that NP $\ne$ RP, where RP is the class of all languages for which
there exists a poly-time algorithm that for every instance $I$, outputs “no” if $I$
is a “no” instance, and “yes” with probability $\ge \frac{1}{2}$ if it is a “yes” instance. It is
believed that NP $\ne$ RP [Hemaspaandra and Ogihara, 2002].


4.1 Network Flow Games 

A network flow game is given by a weighted, directed graph $\Gamma = \langle V, E \rangle$, with
$w : E \to \mathbb{R}_+$ being the weight function for the edges. Here, $N = E$, and
$v(S) = flow(\Gamma|_S, w, s, t)$, where $flow$ denotes the maximum $s$-$t$ flow through $\Gamma$,
where edge weights are given by $w$, and $s, t \in V$.
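
The value of a coalition in a network flow game can be computed with any maximum-flow routine restricted to the coalition's edges. The sketch below uses a plain Edmonds-Karp implementation; the choice of algorithm, the function names, and the small example graph are assumptions made for illustration.

```python
from collections import defaultdict, deque

def max_flow(edges, source, sink):
    """Edmonds-Karp max flow. `edges` is a list of (u, v, capacity) triples."""
    if source == sink:
        raise ValueError("source and sink must differ")
    residual = defaultdict(lambda: defaultdict(float))
    for u, v, cap in edges:
        residual[u][v] += cap  # parallel edges merge into one capacity
        residual[v][u] += 0.0  # make sure the reverse arc exists
    total = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        # Find the bottleneck along the path, then push flow through it.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            residual[parent[v]][v] -= bottleneck
            residual[v][parent[v]] += bottleneck
            v = parent[v]
        total += bottleneck

def coalition_value(all_edges, coalition, source, sink):
    """v(S): max s-t flow using only the edges (players) whose index is in S."""
    return max_flow([all_edges[i] for i in coalition], source, sink)

# Hypothetical network: players are the edge indices 0, 1, 2.
all_edges = [("s", "a", 3.0), ("a", "t", 2.0), ("s", "t", 1.0)]
print(coalition_value(all_edges, {0, 1, 2}, "s", "t"))  # 3.0
print(coalition_value(all_edges, {0, 1}, "s", "t"))     # 2.0
```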

We begin by showing that a similar class of functions is not efficiently learnable.
We define the following family of functions, called min-sum functions:
there exists a list of $n$-dimensional, non-negative
integer vectors $\mathbf{w}_1, \dots, \mathbf{w}_k$. For every $S \subseteq N$, $f(S) = \min_{\ell \in [k]} \mathbf{w}_\ell(S)$, where
$\mathbf{w}_\ell(S) = \sum_{i \in S} w_{i\ell}$. If $k = 1$, we say that the min-sum function is trivial. We
note that Balcan et al. [2012] study the learnability of XOS valuations, where
the min is replaced with a max.

We define $k$-min-sum to be the class of min-sum functions defined with $k$
vectors.


Lemma 4.1. The set of $k$-min-sum functions is not efficiently PAC learnable
unless NP = RP whenever $k \ge 3$.


Proof. Our proof relies on the fact that CNF formulas with more than two
clauses are not efficiently learnable unless NP = RP [Pitt and Valiant].
Given a set of variables $x_1, \dots, x_n$, let us define a set of players

$$N = \{x_1, \bar{x}_1, \dots, x_n, \bar{x}_n, y\}.$$

Given a $k$-clause CNF formula of the form $\phi = \bigwedge_{\ell=1}^k C_\ell$, where $C_\ell$ is a disjunctive
clause containing literals from $N$ (excluding the variable $y$), we define the
following $(k+1)$-min-sum function $f_\phi : 2^N \to \{0,1\}$:

$$f_\phi(S) = \min\{|C_1 \cap S|, \dots, |C_k \cap S|, |S \cap \{y\}|\}.$$

In order to have a value of 1, $S$ must intersect with every $C_\ell$ on at least one
player; otherwise, $f_\phi(S) = 0$. Moreover, $S$ must contain $y$. In what follows, we
will take truth assignments on $x_1, \dots, x_n$ and map them to subsets of players in
$N$, ensuring that $f_\phi(S) = 1$ if and only if $S$ was generated from a truth
assignment satisfying $\phi$. We note that it is possible that one
cannot generate a truth assignment for $\phi$ from all sets $S$ for which $f_\phi(S) = 1$;
for example, $f_\phi(N) = 1$, but this is completely uninformative. Given a truth
assignment $T$ for $\phi$, we define its set equivalent to be $S_T$, where $S_T$ contains
$x_i$ if $x_i$ is true in $T$, otherwise $S_T$ contains $\bar{x}_i$. Also, $S_T$ contains $y$. Thus,
$f_\phi(S_T) = 1$ if and only if $T$ satisfies $\phi$.

Now, given a set of inputs from $\phi$, $(T_1, \phi(T_1)), \dots, (T_m, \phi(T_m))$, we write
$\mathcal{T} = \{T_1, \dots, T_m\}$. For every $T \in \mathcal{T}$, we add to the input $(S_T, \phi(T))$, and
$(S_T \setminus \{y\}, 0)$. The sampled point $(S_T \setminus \{y\}, 0)$ is added to ensure that the
“importance” of $y$ in the definition of $f_\phi$ is noted by any algorithm that is
consistent with the input. In other words, we can “pretend” that the input of
truth assignments to $\phi$ is an input of an unknown $(k+1)$-min-sum function as
defined above, and use it to define a CNF formula that is consistent with $\phi$.
Moreover, if the original input contains $m$ truth assignments, the number of points we
provide to the algorithm learning $f_\phi$ is $2m$.

Suppose that there exists some consistent poly-time algorithm $A$ for $(k+1)$-min-sum
functions; that is, given a list of $m$ inputs from an unknown $(k+1)$-min-sum
function $f$, $A$ outputs a list of non-negative vectors $\mathbf{w}^*_1, \dots, \mathbf{w}^*_{k+1}$ that
define a $(k+1)$-min-sum function that is consistent on the inputs, and does
so in time polynomial in $n$, $\frac{1}{\varepsilon}$, and $\log\frac{1}{\delta}$. Then, if we input to this algorithm
the inputs defined above, it will output $f^*$ defined by $\mathbf{w}^*_1, \dots, \mathbf{w}^*_{k+1}$ such that
$f^*(S_{T_j}) = f_\phi(S_{T_j})$ on the inputs we designed. Using $f^*$, we now show how one
can reconstruct a CNF formula $\phi^*$ with at most $k$ clauses, such that $\phi^*(T) = \phi(T)$
for all $T \in \mathcal{T}$.

Let us define $\mathcal{T}^-$ to be the set of truth assignments in $\mathcal{T}$ that do not satisfy
$\phi$ and $\mathcal{T}^+$ to be the set of truth assignments in $\mathcal{T}$ that do. We assume that we
observe both satisfying and non-satisfying truth assignments (otherwise we can
just output a trivial always true or always false CNF).

First, we claim that there must exist at least one weight vector in the description
of $f^*$ with the value of $y$ set to some positive quantity. Suppose that all of
the vectors $\mathbf{w}^*_1, \dots, \mathbf{w}^*_{k+1}$ have the value of $y$ set to 0; then for any truth assignment
$T \in \mathcal{T}$ that satisfies $\phi$, we have the samples $(S_T, 1)$ and $(S_T \setminus \{y\}, 0)$; but since none of
the weight vectors of $f^*$ assign a positive weight to $y$, $f^*(S_T) = f^*(S_T \setminus \{y\})$,
a contradiction to the consistency of $f^*$. We conclude that there exists at least
one vector $\mathbf{w}^*_\ell$ that has a positive weight assigned to $y$.

Furthermore, we claim that there must exist at least one vector in the description
of $f^*$ with the weight of $y$ set to 0. This holds since for every $T \in \mathcal{T}^-$,
there must be at least one vector $\mathbf{w}^*_\ell$ that takes a value of 0 on $S_T$, to maintain
consistency: vectors with a positive weight on $y$ have a positive weight for $S_T$,
and if all of the vectors defining $f^*$ assign $S_T$ a positive weight, then $f^*(S_T) \ge 1$,
a contradiction to consistency.

Let us take the set of vectors that assign a weight of 0 to $y$, and call that set
$\mathcal{W}^*$; since there exist some vectors that assign a positive weight to $y$, $|\mathcal{W}^*| \le k$.
Furthermore, since there exist some vectors that assign a weight of 0 to $y$,
$\mathcal{W}^* \ne \emptyset$. For every $\mathbf{w}^*_\ell \in \mathcal{W}^*$, we define a clause $C^*_\ell$ such that $C^*_\ell$ is a disjunction
of all literals in the support of $\mathbf{w}^*_\ell$; in other words, if the weight of $x_i$ in $\mathbf{w}^*_\ell$ is
positive, then $x_i$ is in $C^*_\ell$, and if the weight of $\bar{x}_i$ is positive then $\bar{x}_i$ is in $C^*_\ell$.

Let us write the resulting CNF formula as $\phi^*$. First, since $|\mathcal{W}^*| \le k$, the
number of clauses in $\phi^*$ is at most $k$. Second, if $\phi(T) = 1$, then $f^*(S_T) = 1$, and
in particular, $\mathbf{w}^*_\ell(S_T) > 0$ for all $\mathbf{w}^*_\ell \in \mathcal{W}^*$; thus, $\phi^*(T) = 1$ as well. Finally,
if $\phi(T) = 0$, then $f^*(S_T) = 0$, and there is at least one weight vector in $\mathcal{W}^*$
that has $\mathbf{w}^*_\ell(S_T) = 0$. The corresponding clause is not satisfied by $T$, and in
particular, $\phi^*(T) = 0$. This concludes the proof. □
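
The encoding used in the proof of Lemma 4.1 can be sketched in a few lines. The representation below (literals as `(index, polarity)` pairs, the extra player as the string `"y"`) and the example formula are assumptions made for illustration; the sketch only shows the mapping from a CNF formula to a min-sum function and from truth assignments to coalitions, not the learning algorithm itself.

```python
def min_sum(vectors, S):
    """f(S) = min over weight vectors w of sum_{i in S} w[i]."""
    return min(sum(w.get(i, 0) for i in S) for w in vectors)

def cnf_to_min_sum(clauses):
    """One 0/1 vector per clause (its literals), plus one vector supported on 'y'."""
    vectors = [{lit: 1 for lit in clause} for clause in clauses]
    vectors.append({"y": 1})
    return vectors

def set_equivalent(assignment):
    """S_T: the literal (i, True) if x_i is true in T, else (i, False); plus 'y'."""
    return frozenset((i, val) for i, val in assignment.items()) | {"y"}

def satisfies(clauses, assignment):
    """True if every clause contains a literal that the assignment makes true."""
    return all(any(assignment[i] == val for i, val in clause) for clause in clauses)

# Hypothetical formula: (x1 or not x2) and (x2 or x3).
clauses = [[(1, True), (2, False)], [(2, True), (3, True)]]
vectors = cnf_to_min_sum(clauses)

T = {1: True, 2: False, 3: True}
S_T = set_equivalent(T)
# The two sample points generated from T, as in the proof.
samples = [(S_T, int(satisfies(clauses, T))), (S_T - {"y"}, 0)]
print(min_sum(vectors, S_T), min_sum(vectors, S_T - {"y"}))  # 1 0
```

Because the vector supported on `"y"` contributes at most 1, the resulting function takes values in $\{0,1\}$ on every set equivalent.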

Theorem 4.2. Network flow functions are not efficiently learnable unless NP = 
RP. 

Proof. Our proof reduces the problem of learning min-sum functions to the
problem of learning network flow functions. Given a min-sum target function
$f$, defined by $\mathbf{w}_1, \dots, \mathbf{w}_k$ (with $k \ge 3$), and a distribution $\mathcal{D}$ over samples of $N$,
we construct the directed graph $\Gamma = \langle V, E \rangle$ as follows (see Figure 1).

For every weight vector $\mathbf{w}_\ell = (w_{1\ell}, \dots, w_{n\ell})$, we define vertices $\ell, \ell + 1$, and
$n$ edges from $\ell$ to $\ell + 1$, where the capacity of the edge $e_{i\ell}$ is $w_{i\ell}$. Finally, we
denote the vertex $k + 1$ as the target $t$, and the vertex 1 as the source $s$. Given a
set $S \subseteq N$, we write $E_S = \{e_{i\ell} \mid \ell \in [k], i \in S\}$. We observe that the flow from $s$
to $t$ in the constructed graph using only edges in $E_S$ equals $f(S)$; in other words,
$flow_\Gamma(E_S) = f(S)$ for all $S \subseteq N$. Now, given a probability distribution $\mathcal{D}$ over
$2^N$, we define a probability distribution $\mathcal{D}'$ over $2^E$ as follows: $\Pr_{\mathcal{D}'}[E_S] = \Pr_{\mathcal{D}}[S]$
for all $S \subseteq N$, and is 0 for all other subsets of $E$.

We conclude that efficiently PAC learning $flow_\Gamma$ under the distribution $\mathcal{D}'$ is
equivalent to PAC learning $f$, which cannot be done efficiently by Lemma 4.1.
□

Figure 1: An example of the reduction from $k$-min-sum functions to network
flow games, described in Theorem 4.2. Note that we rename the nodes 1 and 4
to $s$ and $t$, respectively. The description of the original 3-min-sum function is
given in (•).

Learning network flow games is thus generally a difficult task, computationally
speaking. In order to obtain some notion of tractability, let us study a
learning scenario where we limit our attention to sets that constitute paths in
$\Gamma$. In other words, we limit our attention to distributions $\mathcal{D}$ such that if $\mathcal{D}$
assigns some positive probability to a set $S$, then $S$ must be an $s$-$t$ path in $\Gamma$.
One natural example of such a distribution is the following: we make graph
queries on $\Gamma$ by performing a random walk on $\Gamma$ until we either reach $t$ or have
traversed more than $|V|$ vertices.

Given a directed path $p = (w_1, \dots, w_k)$, we let $w(p)$ be the flow that can
pass through $p$; that is, $w(p) = \min_{e \in p} w_e$.

Theorem 4.3. Network flow games are efficiently PAC learnable if we limit $\mathcal{D}$
to be a distribution over paths in $\Gamma$.

Proof. Given an input $((p_1, v_1), \dots, (p_m, v_m))$, we let $\hat{w}_e = \max_{j : e \in p_j} v_j$.

We observe that the weights $(\hat{w}_e)_{e \in E}$ are such that $\hat{w}(p_j) = w(p_j)$ for all
$j \in [m]$. This is because for any $e \in p_j$, $\hat{w}_e \ge v_j$, so $\min_{e \in p_j} \hat{w}_e \ge v_j$. On the
other hand, $\hat{w}_e \le w_e$ for every edge $e$ that appears in a sampled path, since $w_e \ge v_j$
for all $v_j$ such that $e \in p_j$, and
in particular $w_e \ge \max_{j : e \in p_j} v_j = \hat{w}_e$. Thus, $\min_{e \in p_j} \hat{w}_e \le \min_{e \in p_j} w_e = v_j$.
In other words, by simply taking edge weights to be the maximum flow that
passes through them in the samples, we obtain a graph that is consistent with
the sample in polynomial time.

Now, suppose that the set of weights on the edges of the graph according
to the target weights $w_e$ is given by $\{a_1, \dots, a_k\}$, where $k \le n$. Then there are
$(k + 1)^n \le (n + 1)^n$ possible ways of assigning values $(\hat{w}_e)_{e \in E}$ to the edges in $E$.
In other words, there are at most $(n + 1)^n$ possible hypotheses to test. Thus, in
order to $(\varepsilon, \delta)$-learn $(w_e)_{e \in E}$, where the hypothesis class $\mathcal{C}$ is of size $\le (n + 1)^n$,
we need a number of samples polynomial in $\frac{1}{\varepsilon}$, $\log\frac{1}{\delta}$ and $\log|\mathcal{C}| \in O(n \log n)$. □
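
The consistent hypothesis built in the proof of Theorem 4.3 is simple enough to sketch directly: each edge receives the largest value observed on any sampled path that contains it. The edge names, target capacities, and sampled paths below are hypothetical.

```python
def fit_edge_weights(samples):
    """samples: list of (path, value), a path being a sequence of edge ids.
    Returns w_hat[e] = max value over the sampled paths that contain e."""
    w_hat = {}
    for path, value in samples:
        for e in path:
            w_hat[e] = max(w_hat.get(e, value), value)
    return w_hat

def path_value(weights, path):
    """Flow through a path: its minimum edge weight."""
    return min(weights[e] for e in path)

# Hypothetical target capacities and sampled s-t paths.
true_w = {"a": 5, "b": 2, "c": 4, "d": 3}
paths = [("a", "b"), ("a", "c"), ("d", "c")]
samples = [(p, path_value(true_w, p)) for p in paths]

w_hat = fit_edge_weights(samples)
print(w_hat)  # {'a': 4, 'b': 2, 'c': 4, 'd': 3}
```

Note that the learned weight of edge `a` is 4 rather than its true capacity 5: the hypothesis agrees with the target on every sampled path without recovering the true weights. Edges that never appear in a sample receive no estimate in this sketch.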
4.2 Threshold Task Games

In threshold task games (TTGs) each player $i \in N$ has an integer weight $w_i$; there
is a finite list of tasks $\mathcal{T}$, and each task $t \in \mathcal{T}$ is associated with a threshold $q(t)$
and a payoff $V(t)$. Given a coalition $S \subseteq N$, we let $\mathcal{T}|_S = \{t \in \mathcal{T} \mid q(t) \le w(S)\}$.
The value of $S$ is given by $v(S) = \max_{t \in \mathcal{T}|_S} V(t)$. In other words, $v(S)$ is the
value of the most valuable task that $S$ can accomplish. Weighted voting games
(WVGs) are the special case of TTGs with a single task, whose value is 1; that
is, they describe linear classifiers.
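
A direct evaluation of the TTG characteristic function can be sketched as follows. The weights, thresholds, and payoffs in the example are hypothetical, and returning 0 when no task is feasible is an assumption of this sketch.

```python
def ttg_value(weights, tasks, coalition):
    """tasks: list of (threshold, payoff) pairs. Returns the best payoff among
    tasks whose threshold is at most the coalition's total weight (0 if none)."""
    total = sum(weights[i] for i in coalition)
    feasible = [payoff for threshold, payoff in tasks if threshold <= total]
    return max(feasible, default=0)

# Hypothetical instance with three players and two tasks.
weights = {1: 2, 2: 3, 3: 4}
tasks = [(4, 10), (8, 25)]

print(ttg_value(weights, tasks, {1}))        # 0: weight 2 meets no threshold
print(ttg_value(weights, tasks, {1, 2}))     # 10: weight 5 meets threshold 4
print(ttg_value(weights, tasks, {1, 2, 3}))  # 25: weight 9 meets both thresholds
```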

Without loss of generality we can assume that all tasks in $\mathcal{T}$ have strictly
monotone thresholds and values: if $q(t) > q(t')$ then $V(t) > V(t')$. Otherwise,
we will have some redundant tasks. For ease of exposition, we assume that there
is some task whose value is 0 and whose threshold is 0. Let $\mathcal{C}^k_{ttg}(Q)$ be the class
of $k$-TTGs for which the set of task values is $Q \subseteq \mathbb{R}$, of size $k$. The first step of
our proof is to show that $\mathcal{C}^k_{ttg}(Q)$ is PAC learnable.

Lemma 4.4. The class $\mathcal{C}^k_{ttg}(Q)$ is PAC learnable.

Proof. In order to show this, we first bound the sample complexity of $\mathcal{C}^k_{ttg}(Q)$.
We claim that $Pdim(\mathcal{C}^k_{ttg}(Q)) < (k + 1)(n + 2)$. The proof relies on the fact that
the VC dimension of linear functions is $n + 1$.

Assume by contradiction that there exists some $\mathcal{S}$ of size $L$, where $L =
(k + 1)(n + 2)$, and some values $r_1, \dots, r_L \in \mathbb{R}_+$ such that for all $\mathbf{b} \in \{0,1\}^L$
there is some TTG $f_{\mathbf{b}} \in \mathcal{C}^k_{ttg}(Q)$ such that $f_{\mathbf{b}}(S_j) \ge r_j$ when $b_j = 1$, and
$f_{\mathbf{b}}(S_j) < r_j$ when $b_j = 0$. We assume that $0 \le r_1 \le \dots \le r_L$. Let us
write the task set as $\mathcal{T} = \{t_1, \dots, t_k\}$, each with a value $V(t_\ell) \in Q$; we order
our tasks by increasing value. Now, by the pigeonhole principle, there exists
some task $t_\ell$ and some $j^*$ such that $r_{j^*}, \dots, r_{j^* + (n+1)} \in [V(t_\ell), V(t_{\ell+1}))$. In
particular, if we write $\mathcal{S}^* = \{S_{j^*}, \dots, S_{j^* + n + 1}\}$, then for every $\mathbf{b} \in \{0,1\}^L$,
there is some $f_{\mathbf{b}} \in \mathcal{C}^k_{ttg}(Q)$ (defined by an agent weight vector $\mathbf{w}_{\mathbf{b}}$, and task
thresholds $T^{\mathbf{b}}_1, \dots, T^{\mathbf{b}}_k$), such that for all $S_j \in \mathcal{S}^*$, if $f_{\mathbf{b}}(S_j) \ge r_j$ it must be
that $f_{\mathbf{b}}(S_j) \ge V_\ell$, i.e., $\mathbf{w}_{\mathbf{b}}(S_j) \ge T^{\mathbf{b}}_\ell$. If $f_{\mathbf{b}}(S_j) < r_j$ then $\mathbf{w}_{\mathbf{b}}(S_j) < T^{\mathbf{b}}_\ell$. Thus,
$((\mathbf{w}_{\mathbf{b}}, T^{\mathbf{b}}_\ell))_{\mathbf{b}}$ is a set of $n$-dimensional linear classifiers that is able to shatter a
set of size $n + 2$, a contradiction. To conclude, $Pdim(\mathcal{C}^k_{ttg}(Q)) < (k + 1)(n + 2)$,
which implies that the sample complexity for PAC learning TTGs is polynomial.

It is easy to construct an efficient algorithm that is consistent with any
sample from $\mathcal{C}^k_{ttg}(Q)$ via linear programming.

Given the inputs $(S_1, v_1), \dots, (S_m, v_m)$, let us write the distinct values
in $v_1, \dots, v_m$ as $a_1, \dots, a_\ell$, and create $\ell$ tasks with values
$V(t_1) = a_1, \dots, V(t_\ell) = a_\ell$. We observe that since
$(S_1, v_1), \dots, (S_m, v_m)$ represent outputs of a function in $\mathcal{C}^k_{ttg}(Q)$,
it must be the case that $\ell \le k$. We further assume that
$V(t_1) < V(t_2) < \dots < V(t_\ell)$. We also define $t_{\ell+1}$ to be an auxiliary
task that has $q(t_{\ell+1}) = V(t_{\ell+1}) = \infty$. Next, we obtain weights for the
players and thresholds for the tasks. For every set $S_j$ it must be the case
that if $v_j = V(t_r)$, then $w(S_j) \ge q(t_r)$, but $S_j$ does not have sufficient
weight to complete $t_{r+1}$. Let $\sigma : [m] \to [\ell]$ be the mapping that, for each
sample $(S_j, v_j)$, maps $S_j$ to the task that it completed; i.e., the task $t_r$
for which $v_j = V(t_r)$.

In order to find the correct set of weights for tasks and agents we use linear
programming; we require that every set $S_j$ has a weight of at least
$q(t_{\sigma(j)})$ but no more than $q(t_{\sigma(j)+1})$ (we add a "dummy" variable whose
weight is extremely high to make this requirement hold for the $\ell$-th task).
Since we cannot explicitly code the condition $w(S_j) < q(t_{\sigma(j)+1})$ into an
LP, we add a tolerance parameter $2^{-r}$. We repeatedly run LP (3) with
$r = 0, 1, 2, \dots$ until a feasible solution is found. Thus, the number of times
we need to rerun LP (3) is at most the number of bits required to represent
the values in the original TTG.

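\begin{align}
\text{find:}\quad & \mathbf{w} \in \mathbb{R}^n,\ \mathbf{q} \in \mathbb{R}^\ell_+ \tag{3}\\
\text{s.t.}\quad & w(S_j) \ge q(t_{\sigma(j)}) && \forall j \in [m]\\
& w(S_j) \le q(t_{\sigma(j)+1}) + 2^{-r} && \forall j \in [m]\\
& w_i \ge 0 && \forall i \in [n]\\
& q_j \ge 0 && \forall j \in [\ell]
\end{align}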


The linear feasibility program (3) has $n + \ell$ variables and $2m$ constraints,
and is thus solvable in polynomial time. Moreover, a feasible solution exists;
namely, the one that corresponds to the weights in the original TTG. Thus,
there is an efficient, consistent algorithm for $\mathcal{C}^k_{ttg}(Q)$. □
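
The consistency requirement that a solution of the feasibility program must
meet can be checked mechanically. The following is a minimal sketch, not the
authors' implementation: `ttg_value` evaluates a TTG given by agent weights,
task thresholds and task values, and `is_consistent` tests whether a candidate
solution reproduces every sample. The function names, the convention that a
coalition completing no task has value 0, and the toy numbers are illustrative
assumptions.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

Sample = Tuple[FrozenSet[int], float]


def ttg_value(coalition: FrozenSet[int],
              weights: Dict[int, float],
              thresholds: Sequence[float],
              values: Sequence[float]) -> float:
    """Value of the best task the coalition can complete (assumed 0 if none)."""
    total = sum(weights[i] for i in coalition)
    best = 0.0
    for q, val in zip(thresholds, values):
        if total >= q:  # the coalition has enough weight for this task
            best = max(best, val)
    return best


def is_consistent(samples: List[Sample],
                  weights: Dict[int, float],
                  thresholds: Sequence[float],
                  values: Sequence[float]) -> bool:
    """True iff the candidate TTG agrees with every observed (S_j, v_j)."""
    return all(ttg_value(S, weights, thresholds, values) == v
               for S, v in samples)


# Toy instance (hypothetical numbers): two tasks worth 10 and 25.
TRUE_WEIGHTS = {1: 2.0, 2: 3.0, 3: 5.0}
THRESHOLDS = [4.0, 8.0]
VALUES = [10.0, 25.0]
SAMPLES = [
    (frozenset({1}), 0.0),
    (frozenset({1, 2}), 10.0),
    (frozenset({3}), 10.0),
    (frozenset({2, 3}), 25.0),
    (frozenset({1, 2, 3}), 25.0),
]
```

Any weight and threshold vector returned by a solver can be passed to
`is_consistent` together with the observed task values to confirm that it
explains the sample.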

Let $\mathcal{C}^k_{ttg}$ be the class of TTGs with $k$ tasks; the following lemma shows
that if we take a sufficient number of samples, a game $v \in \mathcal{C}^k_{ttg}$ can be
PAC approximated by a game $\bar{v} \in \mathcal{C}^k_{ttg}(Q)$, where $Q$ are the observed
values of $v$.

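Lemma 4.5. Given $m \ge \frac{1}{\varepsilon} \log \frac{2^k}{\delta}$ independent samples
$(S_1, v(S_1)), \dots, (S_m, v(S_m))$ from $v \in \mathcal{C}^k_{ttg}$, let
$Q = \bigcup_{j=1}^m \{v(S_j)\}$. The event $\Pr_{S \sim \mathcal{D}}[v(S) \notin Q] < \varepsilon$ occurs
with probability at least $1 - \delta$.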

Proof. First, recall that for every TTG in $\mathcal{C}^k_{ttg}$ we have $k$ different task
values $V_1, \dots, V_k$, and any set of observed samples will show some
$Q \subseteq \{V_1, \dots, V_k\}$; thus, there can be at most $2^k$ observed sets of
values from the samples. Let us write $\mathcal{D}^m$ to be the probability distribution
from which our samples are taken. Given $Q \subseteq \{V_1, \dots, V_k\}$, let $Y_\varepsilon(Q)$ be
the event that $\Pr_{S \sim \mathcal{D}}[v(S) \notin Q] < \varepsilon$. We need to bound the probability
(over samples from $\mathcal{D}^m$) that the event $\neg Y_\varepsilon(Q)$ occurs. Let $\mathcal{S}_m(Q)$ be
the set of all sequences sampled from $\mathcal{D}^m$ for which the observed set of
values is $Q$.

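First, we note that if $S_1, \dots, S_m$ generate the values $Q$, such that
$\neg Y_\varepsilon(Q)$ occurs, then $\Pr_{S \sim \mathcal{D}}[v(S) \in Q] \le 1 - \varepsilon$. Next, note that if
$(S_1, \dots, S_m) \in \mathcal{S}_m(Q)$ then $v(S_j) \in Q$ for all $j = 1, \dots, m$. Thus,

$$\Pr_{\mathcal{D}^m}\big[(S_1, \dots, S_m) \in \mathcal{S}_m(Q)\big] \le \prod_{j=1}^{m} \Pr_{S \sim \mathcal{D}}[v(S) \in Q] \le (1 - \varepsilon)^m,$$

which by our choice of $m$, is at most $\delta / 2^k$. Putting it all together, and
summing over the at most $2^k$ possible sets $Q$,

$$\Pr_{\mathcal{D}^m}[\neg Y_\varepsilon(Q)] = \sum_{Q :\, \neg Y_\varepsilon(Q)} \Pr\big[(S_1, \dots, S_m) \in \mathcal{S}_m(Q)\big] \le 2^k \cdot \frac{\delta}{2^k} = \delta,$$

which concludes the proof. □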
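
The union bound in this proof needs $2^k (1 - \varepsilon)^m \le \delta$. As a rough
illustration, the sketch below computes a sample size of the form
$\lceil \frac{1}{\varepsilon} \ln \frac{2^k}{\delta} \rceil$ and evaluates the resulting failure bound.
The use of the natural logarithm and the helper names `samples_needed` and
`failure_bound` are assumptions made for this example.

```python
import math


def samples_needed(k: int, eps: float, delta: float) -> int:
    """A sample size m for which 2^k * (1 - eps)^m <= delta.

    Uses (1 - eps)^m <= exp(-eps * m), so m >= ln(2^k / delta) / eps suffices.
    """
    if k < 1 or not (0 < eps < 1) or not (0 < delta < 1):
        raise ValueError("need k >= 1 and eps, delta in (0, 1)")
    return math.ceil((k * math.log(2) + math.log(1 / delta)) / eps)


def failure_bound(k: int, eps: float, m: int) -> float:
    """Union bound over at most 2^k value sets, each with mass (1 - eps)^m."""
    return (2 ** k) * (1 - eps) ** m


if __name__ == "__main__":
    for k, eps, delta in [(5, 0.1, 0.05), (10, 0.01, 0.01)]:
        m = samples_needed(k, eps, delta)
        print(k, eps, delta, m, failure_bound(k, eps, m))
```

Note that the required number of samples grows only linearly in $k$ and in
$1/\varepsilon$, and logarithmically in $1/\delta$.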

Using the two lemmas, we are now ready to prove that the class of $k$-TTGs,
$\mathcal{C}^k_{ttg}$, is PAC learnable.

Theorem 4.6. Let $\mathcal{C}^k_{ttg}$ be the class of $k$-TTGs; then $\mathcal{C}^k_{ttg}$ is PAC learnable.

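Proof Sketch. Let $(S_1, v(S_1)), \dots, (S_m, v(S_m))$ be our set of samples.
According to Lemma 4.5, we can choose $m$ such that with probability
$\ge 1 - \frac{\delta}{2}$, $\Pr_{S \sim \mathcal{D}}[v(S) \notin Q] < \frac{\varepsilon}{2}$. We let $\bar{v}$ be the TTG $v$
with the set of tasks reduced to $Q$; that is, $\bar{v}(S) = v(S)$ if $v(S) \in Q$, and
otherwise $\bar{v}(S)$ is the value of the best task that $S$ can complete whose
value is in $Q$. Thus, we can pretend that our input is from $\mathcal{C}^k_{ttg}(Q)$.
According to Lemma 4.4, if $m$ is sufficiently large, then with probability
$\ge 1 - \frac{\delta}{2}$ we will output some $v^* \in \mathcal{C}^k_{ttg}(Q)$ such that
$\Pr_{S \sim \mathcal{D}}[\bar{v}(S) = v^*(S)] \ge 1 - \frac{\varepsilon}{2}$. Thus, with probability $\ge 1 - \delta$, we
have that both $\Pr_{S \sim \mathcal{D}}[\bar{v}(S) = v^*(S)] \ge 1 - \frac{\varepsilon}{2}$ and
$\Pr_{S \sim \mathcal{D}}[v(S) = \bar{v}(S)] > 1 - \frac{\varepsilon}{2}$. We claim that $v^*$ PAC approximates $v$.
Indeed,

\begin{align*}
\Pr_{S \sim \mathcal{D}}[v(S) \ne v^*(S)]
  &= \Pr_{S \sim \mathcal{D}}\big[(v(S) \ne v^*(S)) \wedge (v(S) = \bar{v}(S))\big] \\
  &\quad + \Pr_{S \sim \mathcal{D}}\big[(v(S) \ne v^*(S)) \wedge (v(S) \ne \bar{v}(S))\big] \\
  &\le \Pr_{S \sim \mathcal{D}}[v^*(S) \ne \bar{v}(S)] + \Pr_{S \sim \mathcal{D}}[v(S) \ne \bar{v}(S)] < \varepsilon.
\end{align*}

□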
4.3 Induced Subgraph Games


An induced subgraph game (ISG) [Deng and Papadimitriou, 1994] is given by a
weighted graph $\Gamma = (N, E)$, where for every pair $i, j \in N$, $w_{ij} \in \mathbb{Z}$ denotes
the weight of the edge between $i$ and $j$. We let $W$ be the weighted adjacency
matrix of $\Gamma$. The value of a coalition $S \subseteq N$ is given by
$v(S) = \sum_{i,j \in S} w_{ij}$; i.e., the value of a set of nodes is the weight of their
induced subgraph.


Theorem 4.7. The class of induced subgraph games is efficiently PAC learnable.


Proof. Let $W$ be the (unknown) weighted adjacency matrix of $\Gamma$. Let us write
$e_S$ to be the indicator vector for the set $S$ in $\mathbb{R}^n$. That is, the $i$-th
coordinate of $e_S$ is 1 if $i \in S$, and is 0 otherwise. We observe that in an
ISG, $v(S) = e_S^\top W e_S$. In other words, learning the coefficients of an ISG is
equivalent to learning a linear function with $O(n^2)$ variables (one per vertex
pair), which is known to have polynomial sample complexity
[Anthony and Bartlett, 2009].

Now, given observations $(S_1, v_1), \dots, (S_m, v_m)$, we need to solve a linear
system with $m$ constraints (one per sample), and $O(n^2)$ variables (one per
vertex pair, as above), which is solvable in polynomial time.

\begin{align}
\text{find:}\quad & (w_{i,i'})_{i,i' \in N} \tag{4}\\
\text{s.t.}\quad & \sum_{i,i' \in S_j} w_{i,i'} = v_j && \forall j = 1, \dots, m
\end{align}

The output of (4) is guaranteed to be consistent, and since a solution exists
(namely, $W$), we have a straightforward consistent poly-time algorithm, and
conclude that the class of ISGs is efficiently PAC learnable. □
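
The linear system in this proof can be solved with ordinary Gaussian
elimination. The sketch below is one possible illustration rather than the
authors' code: it assumes players are numbered $0, \dots, n-1$, that each
unordered pair contributes its weight once, and that there are no self-loops.
The names `isg_value` and `learn_isg` are hypothetical. Free variables are set
to 0, so the output is some consistent weighting, not necessarily $W$ itself.

```python
from fractions import Fraction
from itertools import combinations


def isg_value(S, weights):
    """Total weight of the edges with both endpoints in S."""
    return sum(weights.get(p, 0) for p in combinations(sorted(S), 2))


def learn_isg(n, samples):
    """Return edge weights consistent with all (S, v) samples, or None."""
    cols = {p: c for c, p in enumerate(combinations(range(n), 2))}
    rows = []
    for S, v in samples:  # one linear equation per sample
        row = [Fraction(0)] * (len(cols) + 1)
        for p in combinations(sorted(S), 2):
            row[cols[p]] = Fraction(1)
        row[-1] = Fraction(v)
        rows.append(row)

    pivots, r = [], 0
    for c in range(len(cols)):  # Gauss-Jordan elimination, exact arithmetic
        pr = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pr is None:
            continue  # no pivot here: this variable stays free
        rows[r], rows[pr] = rows[pr], rows[r]
        piv = rows[r][c]
        rows[r] = [x / piv for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append((r, c))
        r += 1

    if any(row[-1] != 0 for row in rows[r:]):
        return None  # a row reads 0 = nonzero: the samples are inconsistent
    sol = [Fraction(0)] * len(cols)  # free variables default to 0
    for pr, pc in pivots:
        sol[pc] = rows[pr][-1]
    return {p: sol[c] for p, c in cols.items()}
```

With few samples the system is underdetermined, so the learned weights may
differ from the true ones while still agreeing on every observed coalition.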


4.4 Additional Classes of Cooperative Games

Before we conclude, we present a brief overview of additional results obtained 
for other classes of cooperative games. 


Vector Weighted Voting Games: In weighted voting games (WVGs), each player
$i \in N$ has an integer weight $w_i$; the weight of a coalition $S \subseteq N$ is
defined as $w(S) = \sum_{i \in S} w_i$. A coalition is winning (has value 1) if
$w(S) \ge q$, and has a value of 0 otherwise. Here, $q$ is a given threshold, or
quota. The class of vector WVGs is a simple generalization of weighted voting
games given by Elkind et al. A vector WVG of degree $k$ (or $k$-vector WVG) is
given by $k$ WVGs: $(\mathbf{w}_1; q_1), \dots, (\mathbf{w}_k; q_k)$. A set $S \subseteq N$ is winning if
it is winning in every one of the $k$ WVGs.

Learning a weighted voting game is equivalent to learning a separating
hyperplane, which is known to be easy [Kearns and Vazirani, 1994]. However,
learning $k$-vector WVGs is equivalent to learning the intersection of $k$
hyperplanes, which is known to be NP-hard even when $k = 2$
[Alekhnovich et al., 2004; Blum and Rivest, 1992; Klivans et al.]. Thus,
$k$-vector WVGs are not efficiently PAC learnable, unless P = NP.
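
To make the definition concrete, here is a small sketch that evaluates a
$k$-vector WVG as the conjunction of its $k$ component games. The
representation of a game as a list of `(weights, quota)` pairs and the function
names are assumptions made for illustration only.

```python
from typing import Iterable, List, Sequence, Tuple

WVG = Tuple[Sequence[int], int]  # (weight of each player, quota)


def wvg_wins(S: Iterable[int], weights: Sequence[int], quota: int) -> bool:
    """A coalition wins a single WVG if its total weight meets the quota."""
    return sum(weights[i] for i in S) >= quota


def vector_wvg_value(S: Iterable[int], games: List[WVG]) -> int:
    """Value 1 iff S wins in every component game, otherwise 0."""
    members = list(S)  # allow S to be any iterable, reused k times
    return 1 if all(wvg_wins(members, w, q) for w, q in games) else 0


# A 2-vector WVG over players 0, 1, 2 (hypothetical numbers).
GAMES: List[WVG] = [([3, 1, 1], 3), ([1, 2, 2], 3)]
```

Each component game is a halfspace over indicator vectors, so the winning
coalitions of `GAMES` are exactly the points in the intersection of two
halfspaces.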
MC-nets: Marginal Contribution Nets (MC-nets) [Ieong and Shoham, 2005]
provide a compact representation for cooperative games. Briefly, an MC-net is
given by a list of rules over the player set $N$, along with values. A rule is a
Boolean formula $\phi_j$ over $N$, and a value $v_j$. For example,
$r = x_1 \wedge x_2 \wedge \neg x_3 \to 7$ assigns a value of 7 to all coalitions containing
players 1 and 2, but not player 3. Given a list of rules, the value of a
coalition is the sum of all values of rules that apply to it. PAC learning
MC-nets can be reduced to PAC learning of DNF formulas, which is known to be
intractable [Klivans and Servedio, 2001].

More formally, let $\phi = \bigvee_{j=1}^{m} C_j$ be a DNF formula, where $C_1, \dots, C_m$
are conjunctive clauses over a set of $n$ variables. The reduction is somewhat
similar to the one used in the proof of Theorem 4.2: given the set of variables
$x_1, \dots, x_n$, we define our player set to be $N = \{1, \dots, n\}$. We perform the
following transformation: for every clause $C_j$ in the DNF we say that a set
$S \subseteq N$ satisfies the clause if $i \in S$ whenever $x_i \in C_j$, and $i \notin S$ if
$\neg x_i \in C_j$. We define a rule of the form $C_j \to 1$; that is, if $S \subseteq N$
satisfies the clause it is awarded one point. We can associate a truth
assignment for the variables $x_1, \dots, x_n$ with a set in $N$ in the natural way:
$i \in S$ iff $x_i$ is set to true. Thus, if a truth assignment satisfies the DNF
$\phi$, it has a positive value under the MC-net; otherwise, its value is 0. To
conclude, any algorithm that properly learns MC-nets can be easily transformed
into one that properly learns DNF formulas.
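
The transformation is short enough to write out. In the sketch below, a clause
is encoded as a pair of sets (players that must be present, players that must
be absent); this encoding and the function names are illustrative assumptions,
and only conjunctive rules are handled.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

Clause = Tuple[FrozenSet[int], FrozenSet[int]]  # (positive, negated) literals
Rule = Tuple[Clause, int]                        # clause -> value


def clause_applies(S: FrozenSet[int], clause: Clause) -> bool:
    """S satisfies the clause: all positive players in, all negated players out."""
    positive, negated = clause
    return positive <= S and not (negated & S)


def mcnet_value(S: FrozenSet[int], rules: List[Rule]) -> int:
    """Sum of the values of all rules that apply to S."""
    return sum(value for clause, value in rules if clause_applies(S, clause))


def dnf_to_mcnet(dnf: List[Clause]) -> List[Rule]:
    """One rule C_j -> 1 per clause of the DNF."""
    return [(clause, 1) for clause in dnf]


def dnf_satisfied(true_vars: FrozenSet[int], dnf: List[Clause]) -> bool:
    """Evaluate the DNF on the assignment that sets exactly true_vars to true."""
    return any(clause_applies(true_vars, clause) for clause in dnf)


def all_coalitions(n: int):
    """All subsets of players 1..n."""
    players = range(1, n + 1)
    for size in range(n + 1):
        for combo in combinations(players, size):
            yield frozenset(combo)
```

For every coalition, `mcnet_value(S, dnf_to_mcnet(dnf)) > 0` holds exactly when
`dnf_satisfied(S, dnf)` does, which is the correspondence the reduction relies
on.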


Coalitional Skill Games: Coalitional Skill Games (CSGs)
[Bachrach and Rosenschein, 2008] are another well-studied class of cooperative
games. Here, each player $i$ has a skill-set $K_i$; additionally, there is a list
of tasks $T$, each with a set of required skills $K_t$. Given a set of players
$S \subseteq N$, let $K(S)$ be the set of skills that the players in $S$ have. Let
$T(S)$ be the set of tasks $\{t \in T \mid K_t \subseteq K(S)\}$. The value of the set $T(S)$
can be determined by various utility models; for example, setting
$v(S) = |T(S)|$, or assuming that there is some subset of tasks $T^* \subseteq T$ such
that $v(S) = 1$ iff $T^* \subseteq T(S)$; the former class of CSGs is known as
conjunctive task skill games (CTSGs).

PAC learning coalitional skill games is generally computationally hard. This
holds even if we make some simplifying assumptions; for example, even if we
know the set of tasks and their required skills in advance, or if we know the
set of skills each player possesses, but the skills required by tasks are
unknown. However, we can show that CTSGs are efficiently PAC learnable if
player skills are known.
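
As a concrete reading of these definitions, the sketch below computes $K(S)$,
$T(S)$ and the two utility models mentioned above. Skills and tasks are
represented as strings, and all names and toy data are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set


def coalition_skills(S: Iterable[int],
                     player_skills: Dict[int, FrozenSet[str]]) -> Set[str]:
    """K(S): the union of the skill sets of the players in S."""
    skills: Set[str] = set()
    for i in S:
        skills |= player_skills[i]
    return skills


def completed_tasks(S: Iterable[int],
                    player_skills: Dict[int, FrozenSet[str]],
                    task_skills: Dict[str, FrozenSet[str]]) -> Set[str]:
    """T(S): the tasks whose required skills are all covered by K(S)."""
    have = coalition_skills(S, player_skills)
    return {t for t, required in task_skills.items() if required <= have}


def count_value(S, player_skills, task_skills) -> int:
    """Utility model v(S) = |T(S)|."""
    return len(completed_tasks(S, player_skills, task_skills))


def threshold_value(S, player_skills, task_skills,
                    required_tasks: FrozenSet[str]) -> int:
    """Utility model v(S) = 1 iff every task in T* is completed by S."""
    done = completed_tasks(S, player_skills, task_skills)
    return 1 if required_tasks <= done else 0


PLAYER_SKILLS = {1: frozenset({"code"}), 2: frozenset({"design"}),
                 3: frozenset({"code", "test"})}
TASK_SKILLS = {"app": frozenset({"code", "design"}),
               "qa": frozenset({"test"}),
               "site": frozenset({"design"})}
```

When the player skills are known, each sample reveals which skill sets suffice
for which values, which is the information a learner has to work with.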


5 Discussion 


Our work is limited to finding outcomes that are likely to be stable for an
unknown function. However, learning approximately stable outcomes is a
promising research avenue as well. Such results naturally relate approximately
stable outcomes, such as the $\varepsilon$-core and least core
[Peleg and Sudhölter, 2007], or the cost of stability [Bachrach et al., 2009],
with PMAC learning algorithms [Balcan and Harvey, 2011], which seek to
approximate a target function (M stands for "mostly") with high accuracy and
confidence.

This work has focused on the core solution concept; however, learning other
solution concepts is a natural extension. While some solution concepts, such as
the nucleolus or the approximate core variants mentioned above, can be
naturally extended to cases where only a subset of the coalitions is observed,
it is less obvious how to extend solution concepts such as the Shapley value or
Banzhaf power index. These concepts depend on the marginal contribution of
player $i$ to coalition $S$, i.e., $v(S \cup \{i\}) - v(S)$. Under the Shapley value,
we are interested in the expected marginal contribution when a permutation of
the players is drawn uniformly at random, and $i$ joins the players that precede
it in the permutation. Under the Banzhaf index, $S$ is drawn uniformly at random
from all subsets that do not include $i$. Both solution concepts are easy to
approximate if we are allowed to draw coalition values from the appropriate
distribution [Bachrach et al., 2010]; this is a good way to circumvent
computational complexity when the game is known. It would be interesting to
understand what guarantees we obtain for arbitrary distributions.
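
For reference, sampling from the "appropriate distribution" looks roughly as
follows when the game is known. This is a generic Monte Carlo sketch, not the
procedure of any cited paper; the characteristic function is passed in as a
Python callable, and the function names are assumptions.

```python
import random
from typing import Callable, FrozenSet, Sequence

Game = Callable[[FrozenSet[int]], float]


def shapley_estimate(v: Game, players: Sequence[int], i: int,
                     samples: int, rng: random.Random) -> float:
    """Average marginal contribution of i over random permutations."""
    total = 0.0
    for _ in range(samples):
        perm = list(players)
        rng.shuffle(perm)
        predecessors = frozenset(perm[:perm.index(i)])
        total += v(predecessors | {i}) - v(predecessors)
    return total / samples


def banzhaf_estimate(v: Game, players: Sequence[int], i: int,
                     samples: int, rng: random.Random) -> float:
    """Average marginal contribution of i over uniformly random S without i."""
    others = [p for p in players if p != i]
    total = 0.0
    for _ in range(samples):
        # Including each other player with probability 1/2 is uniform
        # over all subsets that do not contain i.
        S = frozenset(p for p in others if rng.random() < 0.5)
        total += v(S | {i}) - v(S)
    return total / samples
```

Both estimators rely on choosing which coalitions to evaluate; under an
arbitrary sampling distribution the learner has no such control, which is the
difficulty raised above.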

Finally, other models of cooperative behavior would naturally extend to a
learning environment. For example, in hedonic games [Aziz and Savani, 2016],
each player reports a complete preference order over coalitions: either
ordinal, e.g., "player $i$ prefers coalition $S$ to coalition $T$", or cardinal,
i.e., each player assigns a numerical score to each coalition. Given players'
preferences over coalitions, our goal is to find a coalition structure (i.e., a
partition of players into groups) that satisfies certain notions of stability
or fairness. One such solution concept is Nash stability: a partition $\pi$ is
Nash stable if no player can leave the coalition it belongs to under $\pi$, and
benefit by moving to another existing group under $\pi$. The same line of
investigation raised in this work can be applied to the hedonic setting: given
a set of coalitions and some information about them (e.g., the value that their
members assign to them, their order of preference according to the players,
etc.), can we find a partition of the agents that will likely satisfy certain
fairness properties?
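
When preferences are fully known and cardinal, Nash stability as described
here is easy to verify. The sketch below is only an illustration: the utility
function is a hypothetical callable, and, following the wording above, a
player may only deviate to another existing group (not to a new singleton).

```python
from typing import Callable, FrozenSet, List

Utility = Callable[[int, FrozenSet[int]], float]


def is_nash_stable(partition: List[FrozenSet[int]], utility: Utility) -> bool:
    """True iff no player gains by moving to another existing group."""
    for block in partition:
        for i in block:
            current = utility(i, block)
            for other in partition:
                # Player i would join `other`, so evaluate other plus i.
                if other != block and utility(i, other | {i}) > current:
                    return False
    return True


# Additively separable toy preferences (hypothetical numbers):
# players 0 and 1 like each other; everyone else is mutually disliked.
PREF = {
    0: {1: 1, 2: -1},
    1: {0: 1, 2: -1},
    2: {0: -1, 1: -1},
}


def additive_utility(i: int, coalition: FrozenSet[int]) -> float:
    """Sum of i's scores for the other members of its coalition."""
    return sum(PREF[i][j] for j in coalition if j != i)
```

In the learning version of the question, `utility` would only be observed on
sampled coalitions, so a check like this could be run only on the evaluations
that happen to be available.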